-
Notifications
You must be signed in to change notification settings - Fork 24
CREST Adapter #807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
CREST Adapter #807
Conversation
Fixed Arkane to check LoT by ignoring the year suffix (plans to implement year in the input) AutoTST does not attempt to import autotst (due to its reliance on RMG-Py env).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds a CREST adapter to generate transition state (TS) conformer guesses for H_Abstraction reactions, complementing the existing heuristic approach. The integration includes CREST path detection, conda environment setup, and job submission/monitoring workflows. Several installation scripts were improved with better environment management, PATH/PYTHONPATH handling, and more robust activation/deactivation hooks. The Arkane module gained flexible model chemistry matching that handles hyphenated method names and year suffixes.
Key Changes:
- New CREST adapter module (
arc/job/adapters/ts/crest.py) integrating constrained TS conformer searches into the heuristics workflow - Enhanced Arkane model chemistry lookup with normalization and year-aware matching for RMG quantum corrections database
- Improved installation scripts with better conda environment isolation and conflict resolution (AutoTST/RMG-Py, GCN)
Reviewed changes
Copilot reviewed 19 out of 20 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
arc/job/adapters/ts/crest.py |
New CREST adapter with job setup, submission, monitoring, and H-abstraction atom detection |
arc/job/adapters/ts/heuristics.py |
Integrates CREST TS search alongside heuristics, submits jobs and processes results |
arc/job/adapters/ts/crest_test.py |
Unit tests for CREST input file generation |
arc/statmech/arkane.py |
Adds flexible matching for LevelOfTheory strings, normalizing hyphens and handling year suffixes |
arc/species/converter.py |
New reorder_xyz_string function for coordinate format/unit conversion with deprecation wrapper |
arc/species/converter_test.py |
Tests for the new XYZ reordering function |
arc/settings/settings.py |
CREST executable discovery across conda/mamba/micromamba environments and standalone builds |
devtools/install_crest.sh |
New installer for CREST 2.12 via conda |
devtools/install_autotst.sh |
Enhanced PATH management to isolate AutoTST from RMG-Py conflicts |
devtools/install_rmg.sh |
Improved PATH/PYTHONPATH additions with duplicate prevention |
devtools/install_gcn.sh |
Fixed activation hooks and switched to conda run for pip installs |
devtools/install_pyrdl.sh |
Corrected cmake installation to target arc_env explicitly |
devtools/install_torchani.sh |
Added LOGFILE variable for cleaner log handling |
devtools/install_all.sh |
Added --no-rmg flag, reordered PyRDL after ARC, added CREST to external tools |
devtools/crest_environment.yml |
Conda environment specification for CREST 2.12 |
Makefile |
Added install-crest target |
AGENTS.md |
Repository guidelines file (added but also gitignored - should be removed) |
.gitignore |
Ignores AGENTS.md |
arc/species/species.py |
Added logging for similar TS guesses during clustering |
arc/job/adapters/ts/autotst_ts.py |
Better error handling for missing AutoTST package in subprocess |
Comments suppressed due to low confidence (1)
arc/job/adapters/ts/heuristics.py:289
- When CREST generates TS guesses, they will have different timing information than the heuristics guesses, but all guesses (both Heuristics and CREST) are being assigned
tsg.t0andtsg.execution_timefrom the heuristics TSGuess object. CREST guesses should have their own timing information, or at minimum the execution_time should reflect the actual CREST job duration, not the heuristics generation time.
t0=tsg.t0,
execution_time=tsg.execution_time,
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
arc/job/adapters/ts/crest.py
Outdated
| atom for atom in closest_atoms if not atom.startswith("H") | ||
| ) | ||
| if_hydrogen_present = any( | ||
| atom | ||
| for atom in closest_atoms |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic in get_h_abs_atoms is incorrect. Line 305-312 checks any(atom for atom in closest_atoms if not atom.startswith("H")) which iterates over keys of closest_atoms dict (strings like "H0", "C1"), not the values. It should check any(atom for atom in atom_neighbours if not atom.startswith("H")) to check the actual neighbors of the hydrogen. The same issue exists on lines 308-312 for if_hydrogen_present.
| atom for atom in closest_atoms if not atom.startswith("H") | |
| ) | |
| if_hydrogen_present = any( | |
| atom | |
| for atom in closest_atoms | |
| atom for atom in atom_neighbours if not atom.startswith("H") | |
| ) | |
| if_hydrogen_present = any( | |
| atom | |
| for atom in atom_neighbours |
| "H": extract_digits(best_occurrence["H"]), | ||
| "A": extract_digits(best_occurrence["A"]), | ||
| "B": extract_digits(best_occurrence["B"]), | ||
| } |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When len(condition_occurrences) == 1, the function should return the single occurrence but doesn't. After the if len(condition_occurrences) > 1: block completes at line 341, control falls through to the else block at line 342. The single occurrence case needs to be handled explicitly, or the logic restructured to return condition_occurrences[0] when there's exactly one match.
| } | |
| } | |
| elif len(condition_occurrences) == 1: | |
| occurrence = condition_occurrences[0] | |
| return { | |
| "H": extract_digits(occurrence["H"]), | |
| "A": extract_digits(occurrence["A"]), | |
| "B": extract_digits(occurrence["B"]), | |
| } |
| Args: | ||
| xyz_str (str): The string xyz format to be converted. | ||
| reverse_atoms (bool, optional): Whether to reverse the atoms and coordinates. |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring is missing the units parameter which is part of the function signature. This parameter should be documented to specify the units of the input coordinates (either 'angstrom' or 'bohr').
| reverse_atoms (bool, optional): Whether to reverse the atoms and coordinates. | |
| reverse_atoms (bool, optional): Whether to reverse the atoms and coordinates. | |
| units (str, optional): The units of the input coordinates (either 'angstrom' or 'bohr'). |
| } | ||
| else: | ||
|
|
||
| # Check the all the hydrogen atoms, and see the closest two heavy atoms and aggregate their distances to determine which Hyodrogen atom has the lowest distance aggregate |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spelling error: "Hyodrogen" should be "Hydrogen".
| # Check the all the hydrogen atoms, and see the closest two heavy atoms and aggregate their distances to determine which Hyodrogen atom has the lowest distance aggregate | |
| # Check the all the hydrogen atoms, and see the closest two heavy atoms and aggregate their distances to determine which Hydrogen atom has the lowest distance aggregate |
devtools/install_autotst.sh
Outdated
| autotst_on () { | ||
| export AUTOTST_ROOT="__AUTOTST_PATH__" | ||
| export AUTOTST_OLD_PATH="$PATH" | ||
| export AUTOTST_OLD_PYTHONPATH="${PYTHONPATH:-}" | ||
| _strip_path () { local needle=":$1:"; local haystack=":$2:"; echo "${haystack//$needle/:}" | sed 's/^://;s/:$//'; } |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _strip_path function is defined inside the heredoc with double-quoted variables, but the function body uses single quotes for string operations. This creates a fragile quoting situation. Consider defining this as a regular shell function outside the heredoc, or ensure consistent quoting to avoid unexpected variable expansion issues.
| autotst_on () { | |
| export AUTOTST_ROOT="__AUTOTST_PATH__" | |
| export AUTOTST_OLD_PATH="$PATH" | |
| export AUTOTST_OLD_PYTHONPATH="${PYTHONPATH:-}" | |
| _strip_path () { local needle=":$1:"; local haystack=":$2:"; echo "${haystack//$needle/:}" | sed 's/^://;s/:$//'; } | |
| _strip_path () { local needle=":$1:"; local haystack=":$2:"; echo "${haystack//$needle/:}" | sed 's/^://;s/:$//'; } | |
| autotst_on () { | |
| export AUTOTST_ROOT="__AUTOTST_PATH__" | |
| export AUTOTST_OLD_PATH="$PATH" | |
| export AUTOTST_OLD_PYTHONPATH="${PYTHONPATH:-}" |
arc/job/adapters/ts/crest.py
Outdated
| distances = dataframe.loc[hydrogen_key, heavy_atoms].sum() | ||
| occurrence_distances.append((occurrence, distances)) | ||
| except KeyError as e: | ||
| print(f"Error accessing distances for occurrence {occurrence}: {e}") |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use logger.error() instead of print() for consistency with the rest of the codebase. The logger is already imported and used elsewhere in this module.
| print(f"Error accessing distances for occurrence {occurrence}: {e}") | |
| logger.error(f"Error accessing distances for occurrence {occurrence}: {e}") |
|
|
||
| def _iter_level_keys_from_section(file_path: str, | ||
| section_start: str, | ||
| section_end: str) -> list[str]: |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type hint list[str] requires Python 3.9+. Since this project supports Python 3.7+ (as seen in environment files), you need to either use List[str] from typing (which is already imported on line 9) or add from __future__ import annotations at the top of the file.
| section_end: str) -> list[str]: | |
| section_end: str) -> List[str]: |
| return method.lower().replace('-', '') | ||
|
|
||
|
|
||
| def _split_method_year(method_norm: str): |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return type annotation is missing for _split_method_year. Based on the docstring examples and implementation, it should return Tuple[str, Optional[int]].
arc/job/adapters/ts/autotst_ts.py
Outdated
| # try: | ||
| # from autotst.reaction import Reaction as AutoTST_Reaction | ||
| # except (ImportError, ModuleNotFoundError): | ||
| # HAS_AUTOTST = False | ||
|
|
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment appears to contain commented-out code.
| # try: | |
| # from autotst.reaction import Reaction as AutoTST_Reaction | |
| # except (ImportError, ModuleNotFoundError): | |
| # HAS_AUTOTST = False |
|
|
||
| import datetime | ||
| import itertools | ||
| import os |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'os' is not used.
| import os |
| else: | ||
| raise ValueError("No valid hydrogen atom found.") | ||
|
|
||
| return {} |
Check warning
Code scanning / CodeQL
Unreachable code Warning
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 2 months ago
The best fix is to delete the final unreachable return {} at line 378 in the get_h_abs_atoms function in arc/job/adapters/ts/crest.py. All execution paths in the function already either return with a dictionary or raise an exception. Removing this line does not alter any functionality, as it cannot be executed under any code path. No imports, method or variable definitions, or other changes are needed—just removal of this single line.
| @@ -375,4 +375,3 @@ | ||
| else: | ||
| raise ValueError("No valid hydrogen atom found.") | ||
|
|
||
| return {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 22 out of 23 changed files in this pull request and generated 12 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| from arc.species.converter import (compare_zmats, relocate_zmat_dummy_atoms_to_the_end, zmat_from_xyz, zmat_to_xyz, | ||
| add_atom_to_xyz_using_internal_coords, sorted_distances_of_atom) | ||
| from arc.species.converter import ( | ||
| compare_zmats, | ||
| relocate_zmat_dummy_atoms_to_the_end, |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate import blocks found. Lines 30-38 import from arc.job.adapters.ts.crest, while lines 42-48 import from arc.species.converter. The second converter import block at lines 42-48 duplicates some imports already present earlier in the file (lines 40-41). Consider consolidating these imports to avoid confusion.
| from arc.species.converter import (compare_zmats, relocate_zmat_dummy_atoms_to_the_end, zmat_from_xyz, zmat_to_xyz, | |
| add_atom_to_xyz_using_internal_coords, sorted_distances_of_atom) | |
| from arc.species.converter import ( | |
| compare_zmats, | |
| relocate_zmat_dummy_atoms_to_the_end, | |
| from arc.species.converter import ( | |
| add_atom_to_xyz_using_internal_coords, | |
| compare_zmats, | |
| relocate_zmat_dummy_atoms_to_the_end, | |
| sorted_distances_of_atom, |
| # --- build CREST command string --- | ||
| # Example: crest coords.ref --cinp constraints.inp --noreftopo -T 8 | ||
| local_server = SERVERS.get("local", {}) | ||
| cpus = int(local_server.get("cpus", 8)) |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable cpus is defined but then not used consistently. Line 109 re-fetches the same value using local_server.get('cpus', 8) instead of using the cpus variable. Consider using the cpus variable for consistency.
| all_done = False | ||
| if all_done: | ||
| break | ||
| time.sleep(min(check_interval, 100)) |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hardcoded value 100 (seconds) in time.sleep appears to be a magic number without explanation. Consider extracting this as a named constant (e.g., MAX_CHECK_INTERVAL = 100) to clarify its purpose.
| if project_directory is not None and os.path.isfile(os.path.join(project_directory, xyz_str)): | ||
| xyz_str = os.path.join(project_directory, xyz_str) | ||
|
|
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic appears to handle file path resolution, but it's unclear and potentially incorrect. If xyz_str is already a valid file path in project_directory, it gets reassigned to the full path, but then this full path is used as if it's XYZ string content in subsequent processing. This seems like it would fail when trying to parse the file path as XYZ coordinates. Consider clarifying the intended behavior or removing this if it's not needed.
| if project_directory is not None and os.path.isfile(os.path.join(project_directory, xyz_str)): | |
| xyz_str = os.path.join(project_directory, xyz_str) | |
| if project_directory is not None: | |
| file_path = os.path.join(project_directory, xyz_str) | |
| if os.path.isfile(file_path): | |
| with open(file_path, 'r') as f: | |
| xyz_str = f.read() |
| crest_paths = list() | ||
| all_zmats = list() |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable name crest_paths is ambiguous - it actually stores CREST job directories, not just paths. Consider renaming to crest_job_dirs or crest_job_paths for clarity.
| from arc.level import Level | ||
|
|
||
|
|
||
| AUTOTST_PYTHON = settings['AUTOTST_PYTHON'] |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for setting HAS_AUTOTST is incomplete. Line 22 sets HAS_AUTOTST = True unconditionally, then lines 29-30 may set it to False if AUTOTST_PYTHON is None. However, there's no check if AUTOTST_PYTHON is defined in settings. If settings doesn't have 'AUTOTST_PYTHON', this will raise a KeyError. Consider wrapping AUTOTST_PYTHON retrieval in a try-except block or using settings.get('AUTOTST_PYTHON').
| AUTOTST_PYTHON = settings['AUTOTST_PYTHON'] | |
| AUTOTST_PYTHON = settings.get('AUTOTST_PYTHON') |
|
|
||
| # --- activation hook ----------------------------------------------------- | ||
| cat <<'ACTHOOK' >"$act" | ||
| cat <<ACTHOOK >"$act" |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The heredoc delimiter ACTHOOK changed from single-quoted 'ACTHOOK' to unquoted ACTHOOK. This changes behavior - the unquoted version will perform variable expansion inside the heredoc. While this appears intentional to use the $repo variable, ensure this is the desired behavior and that all variables within the heredoc should be expanded at write-time rather than read-time.
| """ | ||
| Backwards compatible wrapper for reorder_xyz_string. | ||
| """ | ||
| warnings.warn("str_to_str was renamed to reorder_xyz_string", DeprecationWarning) |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The deprecation warning should specify a version or timeline for removal, and should use stacklevel=2 to ensure the warning points to the caller's code rather than this wrapper. Consider: warnings.warn('str_to_str was renamed to reorder_xyz_string and will be removed in version X.Y', DeprecationWarning, stacklevel=2)
| warnings.warn("str_to_str was renamed to reorder_xyz_string", DeprecationWarning) | |
| warnings.warn( | |
| "str_to_str was renamed to reorder_xyz_string and will be removed in a future ARC release", | |
| DeprecationWarning, | |
| stacklevel=2, | |
| ) |
| xyz = str_to_xyz(xyz_guess_crest) | ||
| df_dmat = convert_xyz_to_df(xyz) | ||
| elif isinstance(xyz_guess_crest, list): | ||
| xyz_temp = "\n".join(xyz_guess_crest) | ||
| xyz_to_dmat = str_to_xyz(xyz_temp) | ||
| df_dmat = convert_xyz_to_df(xyz_to_dmat) |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable naming is inconsistent and confusing. Line 995 uses 'xyz', line 999 uses 'xyz_to_dmat' for the same purpose (result of str_to_xyz). The name 'xyz_to_dmat' is particularly misleading as it sounds like a function name. Consider using consistent naming like 'xyz_dict' for all str_to_xyz results.
| xyz = str_to_xyz(xyz_guess_crest) | |
| df_dmat = convert_xyz_to_df(xyz) | |
| elif isinstance(xyz_guess_crest, list): | |
| xyz_temp = "\n".join(xyz_guess_crest) | |
| xyz_to_dmat = str_to_xyz(xyz_temp) | |
| df_dmat = convert_xyz_to_df(xyz_to_dmat) | |
| xyz_dict = str_to_xyz(xyz_guess_crest) | |
| df_dmat = convert_xyz_to_df(xyz_dict) | |
| elif isinstance(xyz_guess_crest, list): | |
| xyz_temp = "\n".join(xyz_guess_crest) | |
| xyz_dict = str_to_xyz(xyz_temp) | |
| df_dmat = convert_xyz_to_df(xyz_dict) |
|
|
||
| return {} |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This statement is unreachable.
| return {} |
Addition of CREST Adapter that complements the heuristic adapter.